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Research in a number of fields now requires the analysis of "multi- 
block" data, in which multiple high-dimensional, and fundamentally 
disparate, datatypes are available for a common set of objects. In this 
paper we introduce Joint and Individual Variation Explained (JIVE), 
a general method for the integrated analysis of multi-block data. The 
method decomposes a multi-block dataset into a sum of three terms: 
a low-rank approximation capturing joint variation across datatypes, 
low-rank approximations for structured variation individual to each 
datatype, and residual noise. This decomposition can be used to 
quantify the amount of joint variation between datatypes, visually 
explore joint and individual structure, and reduce the dimensionality 
of the data. The proposed method represents an extension of Princi- 
pal Component Analysis (PCA) and has clear advantages over popu- 
lar two-block methods such as Canonical Correlation Analysis (CCA) 
and Partial Least Squares (PLS). We apply JIVE to data from The 
Cancer Genome Atlas (TCGA), where multiple genomic technologies 
are available for a common set of Glioblastoma Multiforme tumor 
samples. 

Software is available at https : //genome . unc . edu/ j ive/. 

1. Introduction. Many fields of scientific research now involve the 
analysis of high-dimensional data, in which a large number of variables are 
measured for a given set of experimental objects. In particular, there is an 
increasing prevalence of multi-block data, in which multiple fundamentally 
disparate high-dimensional datasets are available for a common set of ob- 
jects. In this paper we introduce JIVE, a general method for the integrated 
analysis of multi-block data. 

While the JIVE method is broadly applicable, we focus on the analysis 
of biological data. In biomedical studies, a number of existing technologies 
may be used to collect diverse information on an organism or tissue sam- 
ple. The amount of available biological data from multiple platforms and 
technologies is expanding rapidly. The 2011 Online Database collection of 
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Nucleic Acids Research lists 1330 publicly available databases that mea- 
sure different aspects of molecular and cell biology [Galberin and Cochrane, 
2011]. Large online databases such as ArrayExpress [Parkinson et al., 2009] 
and the UCSC Genome-browser [Rhead et al., 2010] often contain multiple 
disparate datatypes collected from a common set of samples. Large-scale 
projects like The Human Connectome Project [Sporns et al., 2005] and The 
Cancer Genome Atlas [TCGA Research Network, 2008] focus on the inte- 
grated analysis of multi-block data. 

Well established multivariate methods can be used to separately analyze 
different datatypes measured on the same set of objects. However, individual 
analysis of each datatype will not capture associations and potential causal 
relationships between datatypes. Furthermore, each datatype can impart 
unique and useful information. There is a need for new methods that explore 
associations between multiple datatypes and combine data from multiple 
disparate sources when making inference about the objects. This motivates 
an exciting new area of statistical research. 

The JIVE method decomposes multi-block data into a sum of three com- 
ponents: a low rank approximation capturing joint structure between datatypes, 
low rank approximations capturing structure individual to each datatype, 
and residual noise. Analysis of individual structure provides a way to identify 
potentially useful information that exists in one datatype, but not others. 
Accounting for individual structure also allows for more accurate estimation 
of what is common between datatypes. JIVE can identify joint structure not 
found by existing methods, which are described in Section 1.2. Furthermore, 
JIVE is robust to the dimensionality of the data, applicable to more than 
two datatypes, and has a simple algebraic interpretation. 

In Section 1.1 we formally introduce multi-block data, and in Section 1.2 
we describe related existing methods for the integrated analysis of multiple 
datatypes. In Section 2 we give a detailed description of the JIVE method. 
In Section 3, we motivate and validate the JIVE method through a variety 
of simulated examples. 

Section 4 describes an application of JIVE to multi-block data on Glioblas- 
toma Multiforme (GBM) tumor samples from TCGA, an ongoing collabo- 
rative effort funded by the National Cancer Institute (NCI) and the Na- 
tional Human Genome Research Institute (NHGRI). A goal of TCGA is 
to molecularly characterize cancer through analysis and integration of mul- 
tidimensional large scale genomic data [TCGA Research Network, 2008]. 
Information from disparate genomic datatypes can be integrated for a more 
comprehensive understanding of cancer genetics and cell biology. In addition, 
we would like to find distinguishing characteristics between tumor samples. 
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either across multiple datatypes or unique to a single datatype, that may be 
used to identify targeted therapies. 



1.1. Multi-block data. Formally, a multi-block dataset consists of matri- 
ces Xi,X2, . . . with k > 2. Each matrix has n columns, corresponding 
to a common set of n objects. The zth matrix Xi has pi rows, each corre- 
sponding to a variable in a given measurement technology that varies from 
matrix to matrix. To give an example, for a given set of n biological samples, 
the rows of Xi might contain gene expression measurements (of dimension 
pi), the rows of X2 might contain genotype information, and the rows of X3 
might contain the concentration of different metabolites. The data matrices 
in a multi-block data set may be unified vertically into a single data matrix 



X 



: p X 



where p = + ... + Pk- 

Direct analysis of X can be problematic as the size and scale of the con- 
stituent dataypes are often significantly different. To remove baseline differ- 
ences between datatypes, it is helpful to row-center the data by subtracting 
the mean within each row. Datatypes may also be of different dimension 
(pi) or differ in variability. To circumvent cases where "the largest dataset 
wins", we scale each datatype by its total variation, or sum-of-squares. In 
particular, for each i define X^^^^^^ = where || • || defines the Frobenious 

norm = Then, HXI^^^^^^H = 1 for each z, and each datatype 

contributes equally to the total variation of the aggregated matrix 



(1.1) 



X 



scaled 



j^scaled 



j^scaled 



1.2. Existing methods. One approach to the analysis of multi-block data 
is to mine the data for variable- by- variable associations. In computational 
biology, genome wide association and expression quantitative trait loci stud- 
ies [Gilad et al., 2008] can identify millions of pairwise variable associations 
between genomic datatypes. Furthermore, network models can link asso- 
ciated variables across and within datatypes (see Adourian et al. [2008]). 
However, analysis of variable-by-variable associations alone does not iden- 
tify the global modes of variation that drive associations across and within 
datatypes, which is the focus of this paper. 
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Principal Component Analysis (PCA) of the block-scaled matrix X^^^^^ 
coincides with Consensus PCA [Westerhuis et al., 1998, Wold et al., 1996]. 
This direct approach using the aggregated data matrix is also utilized by the 
iCluster method [Shen et al., 2009]. Designed to cluster samples based on 
information from multiple genomic datatypes, iCluster performs clustering 
based on a factor analysis of the aggregated matrix X. While these meth- 
ods synthesize information from multiple datatypes, they do not distinguish 
between joint or individual effects. 

Canonical Correlation Analysis (CCA) [Hotelhng, 1936] is a popular method 
to globally examine the relationship between two sets of variables. If Xi 
and X2 are two datasets on a common set of samples, the first pair of 
canonical loadings (variable weights) ui and U2 are unit vectors maximizing 



Geometrically, ui and U2 can be interpreted as the pair of directions that 
maximize the correlation between Xi and X2. Sample projections on the 
canonical loadings, uJXi and U2X2, give the canonical scores for Xi and X2. 
Subsequent CCA directions can be found by enforcing orthogonality with 
previous directions. For datasets with pi > n or p2 > n the CCA directions 
are not well defined, and over-fitting is often a problem even whenpi,p2 < ^• 
Hence, standard CCA is typically not applicable to high-dimensional data. 

Partial Least Squares (PLS) [Wold, 1985] directions are defined similarly 
to CCA, but maximize covariance rather than correlation. PLS is appro- 
priate for high-dimensional data. However, Trygg and Wold [2003] examine 
how structured variation in Xi not associated with X2 (and vice- versa) can 
drastically alter PLS scores, making the interpretation of such scores prob- 
lematic. Their solution, called 02-PLS, seeks to remove structured variation 
in Xi not linearly correlated with X2 (and vice versa) from the PLS com- 
ponents. As such, 02-PLS components are often more representative of the 
true joint structure between two datatypes. However, the restriction of 02- 
PLS (and PLS) to pairwise comparisons limits their utilty in finding common 
structure among more than two datatypes. 

Witten and Tibshirani [2009] recently introduced Multiple Canonical Cor- 
relation Analysis (mCCA) to explore associations and common structure on 
two or more datasets. For Xi, X2, X/^ as in Section 1.1, each row centered 
and row standardized, the standard mCCA loading vectors ui,U2, --.^Uk sat- 
isfy 



Corr(ix^Xi, U2X2). 



argmax 



J2CoY{ufXi,uJXj). 



i<j 



i<j 



As such, mCCA can be viewed as a natural extension of PLS to more than 
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two datatypes. 

Di et al. [2009] develop multi-level functional PCA (MF-PCA) for the 
analysis of variation between and within grouped samples of functional 
data. Similar in spirit to JIVE, MF-PCA yields a sum of two PCA de- 
compositions: one for variability between groups and one for variability 
within groups. We stress that JIVE is designed for analysis across disparate 
datatypes, while MF-PCA analyzes grouped observations on the same func- 
tional datatype. Furthermore, the global component in JIVE models sim- 
ilarities across datatypes, while the global component in MF-PCA models 
differences between sample groups. 

2. JIVE. The JIVE method decomposes multi-block data into three 
terms: joint structure between datatypes, structure individual to each datatype, 
and residual noise. As in 02-PLS, accounting for individual structure can 
lead to better estimation of what is joint between multiple datatypes, and 
vice-versa. In addition, JIVE is robust to the dimensionality of the data 
(including n > p and p > n), has a simple algebraic interpretation, and can 
be applied to more than two datatypes. 

2.1. Model Let Xi,X2, be datatypes as in Section 1.1. Variation 

that is consistent across datatypes in the aggregated matrix X is represented 
by a single pxn matrix of rank r < rank(X) . This matrix represents the joint 
structure of X. For each X^, structured variation in Xi unrelated to the other 
datatypes is represented by a x n matrix of rank < rank(X^). These 
matrices represent the individual structure of each Xi. The sum of joint and 
individual structure gives a low-rank decomposition approximating the joint 
data matrix X. The general model for two datatypes Xi and X2 is shown 
in Figure 1. 

More formally, let Ai be the matrix representing the individual structure 
of Xi, and let Ji be the submatrix of the joint structure matrix that is 
associated with Xi. Then, the unified JIVE model is 

Xi = Jl + + ei 

(2.1) : 

where are piXn error matrices of independent entries with E(e^) = O^-xn- 
Let 

■ Jl ' 
J- '. , 

Jk 
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Fig 1. Depiction of the JIVE decomposition for two datasets. The data are approximated 
by a low rank matrix of joint variation and low rank matrices giving structured variation 
unique to Xi and X2 . 



denote the joint structure matrix. The model imposes the rank constraints 
rank(J) = r and rank(74^) = for i — Furthermore, we assume 

that the rows of joint and individual structure are orthogonal: jA^ — Opxpi 
for i = 1, A:. Intuitively, this means that sample patterns related to joint 
variation between datatypes are unrelated to sample patterns responsible 
for structure in only one datatype. This assumption does not constrain 
the model, in that any set of datatypes in the form (2.1) can be written 
equivalently with orthogonality between joint and individual structure. Fur- 
thermore, the orthogonality constraint assures that the joint and individual 
components in the structure of the aggregated matrix X are uniquely de- 
termined. See Appendix A.l for more details. 

2.2. Estimation. Here we discuss estimation of joint and individual struc- 
ture for fixed ranks r, ri, r/^. Section 2.4 discusses the choice of ranks. Joint 
and individual structure are estimated by minimizing the sum of squared 
error. Let R be the p x n matrix of residual noise after accounting for joint 
and individual structure: 



■ Ri ' 






-Ji-Ai 






_ Xk 


- Jk - Ak 



We estimate the matrices J and Ai, . . . , by minimizing the sum of squared 
residuals under the given ranks. This is accomplished by iteratively 

estimating joint and individual structure: 

• For fixed J, find Ai, ...^A^ to minimize 

• For fixed Ai, A^^, find J to minimize 
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• Repeat until convergence. 

The joint structure J minimizing ||i?|| for fixed individual structure is 
equal to the first r terms in the singular value decomposition (SVD) of X 
with individual structure removed. For fixed joint structure, the estimated 
individual structure for Xi is equal to the first ri terms of the SVD of 
Xi with the joint structure removed. The estimate of individual structure 
for Xi will not change those for Xj^ j i, and hence the k individual 
approximations minimize ||i?|| for fixed joint structure. Pseudocode for the 
iterative algorithm is given in Appendix A. 2. We note that the iterative 
method is monotone in the sense that ||i?|| decreases at each step. Thus 
||i?|| converges to a coordinate-wise minimum, that can't be improved by 
changing the estimated joint or individual, structure. Further convergence 
properties of the algorithm are currently under study. 

2.3. Relationship to PC A. For a row-centered p x n matrix X, the first 
r principal components yield the rank r approximation 

X ^US, 

where S{rxn) contains the sample scores and U{pi x r) contains the variable 
loadings for the first r components. 

As in PCA, the rank r joint structure matrix J in the JIVE model can 
be written as US, where U is a, p x r loading matrix and 5 is an r x n score 
matrix. Let 

" Ui 
U ^ : 
. Uk 

where Ui gives the loadings of the joint structure corresponding to the rows 
of Xi. The rank individual structure matrix Ai for Xi can be written as 
WiSi, where Wi is 3. piX Vi loading matrix and Si is an x n score matrix. 
Then, the low rank decomposition of Xi into joint and individual structure 
is given hj Xi UiS ^ WiSi. This gives the unified model 

Xi = UiS + WiSi + Ri 

(2.2) : 

Xk = UkS^WkSk^Rk- 

Joint structure is represented by the common score matrix S. These scores 
elicit patterns in the samples that explain variability across multiple datatypes. 
The loading matrices Ui indicate how these joint scores are expressed in the 
rows (variables) of datatype i. The score matrices Si elicit sample patterns 
individual to datatype z, with variable loadings Wi. 
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2.4. Rank Selection. Section 2.2 describes the estimation of joint and 
individual structure in the JIVE model for a given set of ranks r, ri, r/^. 
The choice of these ranks is important to accurately quantify the amount 
of joint variation and individual structure among datatypes. Furthermore, 
over or underestimation of what is joint can negatively effect estimation of 
what is individual, and vice- versa. Rank selection is integral to the estima- 
tion of joint and individual structure in JIVE. Indeed, rank constraints or 
another form of penalization is essential to simultaneously identify joint and 
individual variation. 

Here, we describe a two-stage permutation testing approach to rank se- 
lection. First, the total underlying rank of structured (joint and individual) 
variation is estimated for each Xi. We refer to this as the effective rank, and 
denote it by rank-eff(X^). The joint structure rank r, and individual struc- 
ture ranks are then estimated under the restriction r + = rank-eff(X^) 
for i = 1, k. 

To estimate rank-eff(X^) we use a permutation-based scheme that relies 
on the singular values of X^, which is described in Peres-Neto et al. [2005]. 
In this procedure, the first singular value of Xi is compared to the first 
singular values from several random permutations of Xi. In each permu- 
tation, columns are permuted independently within each row to maintain 
the distribution of each variable, while effectively removing between- variable 
associations. If only a small proportion (e.g. a = 0.01) of the largest sin- 
gular values under permutation are greater than the observed value, the 
observed value is deemed significant, and the associated rank one matrix is 
subtracted from Xi. This process is repeated until significance is no longer 
achieved; rank-eff(X^) is defined to be the number of significant singular 
values obtained in this way. 

Given rank-eff(X^) for each z, a similar permutation based test is used 
to determine the ranks r and ri, . . . ,rk. For estimated joint structure J of 
rank r and individual structures Ai of rank (where r + = rank-eff(X^)), 
we test for remaining joint structure in the residuals X — J. This test is 
based on permuting columns within each datatype (across all rows), which 
maintains the multivariate distibution of each datatype while effectively re- 
moving between-datatype associations. Rank(J) = r is increased until the 
estimated individual structure and residual noise X — J has no significant 
joint structure. Detailed pseudocode for this permutation-based approach 
is provided in Appendix A. 3. Alternative approaches to rank selection are 
currently under study. 
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2.5. Variable Sparsity. In many practical applications, important struc- 
ture between samples or objects is only present on a small subset of the 
measured variables. This motivates use of sparse methods, in which only 
a subset of variables contribute to a fitted model. Sparse versions of ex- 
ploratory methods such as PC A [Shen and Huang, 2008], PLS [Le Cao et al., 
2008] and CCA [Parkhomenko et al., 2009] already exist. 

Here, we describe the use of a penalty term to induce variable sparsity 
in the JIVE decomposition. Sparsity is accomplished if some of the variable 
loadings for joint and individual structure (U and Wi in Section 2.3) are 
exactly 0. For weights A and A^, we minimize the penalized sum of squares 



where Pen is a penalty designed to induce sparsity in the loading vectors. In 
our implementation, Pen() is an LI penalty analogous to Lasso regression 
[Tibshirani, 1996], namely 



Under this penalty, loadings of variables with a small or insignificant con- 
tribution tend to shrink to 0. Other sparsity-inducing penalties (e.g. hard 
thresholding) may be substituted for LI penalization. 

Estimation under sparsity penalization is accomplished by an iterative 
procedure analogous to that used for the non-sparse case: 

• For fixed J, find Ai to minimize + A^Pen(VI/^) for each i = 
1, hdots^ k. 

• For fixed Ai, ...,Aj^, find J to minimize + XPen.{U) 

• Repeat until convergence. 

At each iteration, the sparsity penalty is incorporated through the use of a 
sparse singular value decomposition (SSVD), adapted from Lee et al. [2010]. 
The weights A, A^ may be pre-specified or estimated via the Bayesian Infor- 
mation Criterion (BIC) [Schwarz, 1978] at each iteration. 

Inducing sparsity in the joint structure effectively identifies subsets of 
variables within each datatype that are associated. Examination of the joint 
sample scores, in turn, reveals sample patterns that drive these associations. 
Section 4.3 illustrates the use of sparsity in the interpretation of associations 
across disparate genomic datatypes. 



R\\^ + APen(t/) + ^A^ Pen(H^^) 
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2.6. Dimension Reducing Shortcut. For high-dimensional data, where 
Pi > computing time can be improved by reducing the dimensionahty 
of Xi, ...,Xk at the outset. Rather than working with all variables, we con- 
sider a dimension-reducing transformation of the original data: Xi 
where X^ is an n x n matrix derived from the SVD of Xi. In particular, if 

SVD{Xi) = UiKiVi 

then X^- — AiV- . Covariance and Euclidian distance between columns (sam- 
ples) of Xi are preserved in X;^. Applying the JIVE algorithm to the trans- 
formed datasets X^, . . . , X^ can be substantially faster for high-dimensional 
data. Estimated joint {J-^) and individual (A^) structure for X;^ can then 
be transformed back to the original variable space through the left singular 
vectors Ui\ Ji ^ UiJ-^ and Ai = UiA^. Applying the iterative estimation 
method to X directly or estimating joint and individual structure for X^ 
and mapping back to the original variable space will yield identical results. 
Hence, high-dimensional data is always transformed via SVD before estima- 
tion of joint and individual structure. 

3. Illustrative Examples and Simulated Data. Here we motivate 
and demonstrate the use of JIVE through three examples. In Section 3.1, 
a simple example illustrates how JIVE can identify structure not found by 
existing methods such as Consensus PCA, PLS or CCA. In Section 3.2, we 
add an artificial signal to real data from three genomic platforms in order to 
illustrate the use of JIVE to identify joint signal among multiple datatypes 
with complex structure. In Section 3.3, the iterative estimation method is 
applied to hundreds of diverse datasets from the model (2.1), in order to 
demonstrate its robustness. 

3.1. Illustrative Example. As a basic illustration we generate two matri- 
ces, X and y, with simple patterns corresponding to joint and individual 
structure. The simulated data is depicted in Figure 2. Both X and Y are of 
dimension 50 x 100, i.e., each has 50 variables measured for the same 100 
objects. A common pattern V of 100 independent standard normal variables 
is added to half of the rows in X and half of the rows in Y . This represents 
the joint structure between the two datasets. Structure individual to X is 
generated by partitioning the objects into five groups, each of size twenty. 
Those columns corresponding to group 1, 2, 3, 4, or 5 have -2, -1, 0, 1, 2 
added to each row of X, respectively. Structure individual to Y is generated 
similarly, but the groups are defined independently of the groups in X. Fi- 
nally, independent N(0,1) noise is added to both X and Y. Note that the 
important joint structure is not visually apparent. 
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Fig 2. X andY are genereted by adding together joint structure, individual structure, and 
noise. Blue corresponds to negative values, red positive values. 



The common pattern V represents an underlying phenomenon that con- 
tributes to several variables in both X and Y. Practically, the individual 
structure in X (or Y) may correspond to an experimental grouping of the 
measured variables in X (Y) not present in Y (X), e.g., batch effects in 
microarray data. Our goal is to identify both the common underlying phe- 
nomenon and individual group effects. 

Consensus PC A of the aggregated matrix [X' Y'Y does a poor job of find- 
ing the joint structure. The scatterplot in Figure 3 shows a weak association 
between the first principal component scores and the joint response V. This 
is because PCA of the aggregated data is driven by all variation in the data, 
joint or individual. 

Figure 4 shows an analysis of PLS and CCA for X and Y. The scores 
for the first pair of PLS directions show a weak association between X 
and Y (panel C). Furthermore, the PLS scores are not strongly related 
to the joint response V (panels A and B). Scores for X and Y show a 
stronger association with V within classes, indicating how for PLS individual 
structure can interfere with the estimation of joint structure. The first pair 
of CCA scores are very highly correlated (panel F), but show nearly no 
association with the joint or individual structure (panels D and E). This 
illustrates the tendency of CCA to overfit on high-dimensional data. 

Next we consider the JIVE analysis of X and Y. The permutation testing 
approach described in Section 2.4 suggests a rank one approximation for 
joint structure and rank one approximations of individual structure for both 
X and Y. Scores and loadings for the joint component and both individual 
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Fig 3. Scatterplot of the first consensus principal component scores Vs the joint signal V . 
The scores are weakly associated with the joint . 
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Fig 4. The first pair of PLS, and CCA, directions for X andY . Panels (A) and (B) show 
a weak association between the PLS scores and the joint signal V. Points are colored by 
simulated class in both X and Y , and are more highly associated with V within each class. 
The first pair of CCA directions correlate with each other (F), but not the common signal 
V (D and E). This illustrates the tendency of CCA to overfit. 
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components are shown in Figure 5. JIVE is able to find the true joint signal 
between the two datasets, as joint scores are closely associated with the 
common response V (panel A). Furthermore, individual scores do a good 
job of distinguishing classes specific to X and Y (panels D and F). The joint 
signal was added to only the first 25 variables in X and Y, which is reflected 
in the joint loadings (panels B and C). The individual classes were defined 
on all 50 variables for both X and y, which is reflected in the individual 
loadings (panels E and F). Note that joint and individual loadings are not 
constrained to be orthogonal, which gives the analysis more flexibility. 



Joint Component Individual Components 




Y loadings Y unique component 

Fig 5. Scores and loadings for joint and individual components in the JIVE decomposition. 
Joint scores are highly associated with the common signal V (panel A). Individual scores 
distinguish classes specific to X and Y (D and F). Joint loadings (B and C) show a strong 
effect (difference from zero) on half of the variables in X and Y . Individual loadings (E 
and G) show a similar effect on all variables in X and Y . 

Figure 6 shows the resulting low rank approximations for joint structure, 
individual structure and residual noise obtained by JIVE. Estimates closely 
resemble the true signal in Figure 2. 

3.2. Example Based on Genomic Data. We now use real genomic data to 
illustrate the extraction of a known joint signal from datatypes with realis- 
tic individual variation. We begin with gene expression (GE), copy number 
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Fig 6. JIVE estimates for joint structure, individual structure, and noise. Blue corresponds 
to negative values, red positive values. 

variation (CN), and microRNA (miRNA) data available for the same set 
of 234 GBM tumor tissue samples from TCGA [TCGA Research Network, 
2008]. The GE and CN data are both of dimension 14556 (expression in- 
tensity and average copy number variation for the same 14556 genes), the 
miRNA data is of dimension 535 (535 miRNA intensities measured). The 
three datasets are normalized as in Section 1.1. 

These three genomic datasets are used to simulate data with realistic 
background variation (see Figure 7). First, samples (columns) are randomly 
permuted separately within each dataset. This effectively removes joint struc- 
ture between datatypes, but maintains the structure within each datatype. 
An artificial joint signal is then added to 5% of the rows in each of the 
three datatypes. This joint signal is generated by adding a constant equal to 
the row standard deviation to the first half of the samples, and subtracting 
the same constant from the other half. This yields two sample clusters that 
are common to each of the three datatypes. We would like to identify this 
common signal among the variation individual to each datatype. 

The first Consensus PC of the three datasets is shown in Figure 8. It shows 
a slight association between the two joint clusters, but does not clearly sep- 
arate them. In this case, the joint signal does not dominate the variation in 
the data, and is hard to extract without accounting for individual structure. 
Application of PLS and CCA is complicated by the fact that these methods 
are designed for two datatypes. Moreover, the pairwise applications of these 
methods is also negatively affected by individual structure and over-fitting. 

A JIVE analysis of the data successfully distinguishes joint and Individ- 
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Fig 7. Simulation based on miRNA, GE and CN data. Columns are permuted within each 
datatype so that samples are not associated. A common signal is then added to 5% of the 
rows in each datatypes. 



Principal Component 1 
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Fig 8. Scores for the first Consensus PC of the aggregated data, colored by cluster. The 
two clusters (representing common signal) are not well distinguished. 

ual structure. The permutation approach described in Section 2.4 suggests 
estimating rank 1 joint structure, and rank 39, 35, and 13 individual struc- 
ture for GE, CN and miRNA, respectively. Sample scores and GE, CN and 
miRNA loadings for the joint component are shown in Figure 9. Joint scores 
now clearly separate the two artificial clusters. Furthermore, loadings clearly 
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indicate which rows contain the joint signal in each of the three datatypes. 
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Fig 9. Joint component scores and loadings. Scores are colored by artificial cluster and 
show good seperahility. Loadings for CN, GE, and miRNA are ordered so that the first 5% 
have joint signal added, and these show the strongest contribution (difference from 0). 



3.3. Extensive simulations from model. To test the robustness of the 
iterative estimation method, we apply it to a diverse set of 200 randomly 
generated models. In our simulations, two data matrices Xi and X2 are 
generated with varying sample size and dimensions. Joint and individual 
structure are generated from different probability distributions, and have 
varying ranks. These randomly generated models are tested both with and 
without noise. 

We first test models with joint and individual structure, and no additional 
noise. The sample size n and dimensions pi and p2 for Xi and X2 are drawn 
at random uniformly from {10,11,...,100}. The rank of joint structure, r, and 
individual structure, ri and r2, are each drawn from {0,1,..., 4}. Low-rank 
structure is then generated in factorized form, as in (2.2): 

Xi = UiS + WiSi 

X2 = U2S^W2S2. 

For each realization of Ui,U2, Wi, W2, Si, and S2, the entries are generated 
from a random choice among the distributions N(0, 1), Uniform(0, 1) and 
Bernoulli(^). 



JOINT AND INDIVIDUAL VARIATION EXPLAINED 



17 




10 20 30 40 50 60 70 80 90 100 

Simulation 



Fig 10. Sum of squared residuals in the simulated model (green), the fitted model with 
true ranks (blue) and the fitted model under permutation testing (red) in 100 randomly 
generated simulations. 

We applied the iterative estimation method to Xi and X2 for 100 simu- 
lated examples, generated as described above. In all cases, the fitted model 
with the true ranks r, ri and r2 converged to the simulated data Xi and 
X2 {\\X — X|p < 10"-*^^). This is evidence that, in the absence of noise, low 
rank joint and individual structure can be recovered exactly. 

We then included error in our simulations, using the model 

Xi = UiS + WiSi + Ei 

X2 = U2S^W2S2^E2, 

where Ui,U2, S, Wi, W2, Si, S2 are generated as above, and Ei, E2 are error 
matrices with independent entries from A^(0, a^). The standard deviation, a, 
of the noise is randomly determined from a Uniform(0, 2) distribution. In 100 
randomly generated examples, joint and individual structure are estimated 
given the simulated ranks r,ri and r2, and also with ranks estimated by 
the approach described in Section 2.4. We expect the sum of squared error 
+ ||£'2|p to be close to the sum of squared residuals after estimating 
joint and individual structure, + ||^2|P • 

Figure 10 plots the sum of squared errors in the simulated model and 
the sum of squared residuals from the fitted model, both with and without 
estimated ranks, for the 100 examples. When the simulated ranks are used. 
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the sum of squared residuals for the fitted model is always smaller than the 
sum of squared error in the simulation. This is evidence that the iterative 
approach is successful in minimizing the sum of squared residuals. Estimated 
ranks for joint and individual structure agree completely with the simulated 
ranks in 60% of simulations. The permutation testing approach tends to 
underestimate the simulated ranks, if noise overwhelms the low rank signal. 

4. Application to Genomic Data. Here we exhibit how JIVE can 
be used to explore global associations across multiple genomic datatypes. 
We examine a set of 234 GBM tumor samples. For each tumor sample, 
535 miRNA intensities and 24,350 gene expression intensities are available. 
These data are publicly available from The Cancer Genome Atlas (TCGA) 
[TCGA Research Network, 2008]. The pre-processed data used for this anal- 
ysis is available at https://genome.unc.edu/jive. Verhaak et al. [2010] 
classified these GBM samples into four gene expression subtypes: Neural, 
Mesenchymal, Proneural and Classical. These subtypes have distinct ex- 
pression characteristics, copy number alterations, and gene mutations. In 
addition, there were clinical differences across subtypes in response to ag- 
gressive therapy. The classification of GBMs into these subtypes may be 
exploited for more targeted therapy. 

Copy number abberations and somatic mutations, and their relationship 
with gene expression, have been recognized as important aspects of GBM 
biology (see, e.g., Bredel et al. [2009] and TCGA Research Network [2008]). 
However, the role of miRNA in GBM biology has not been well studied. We 
applied JIVE to the miRNA and gene expression data in order to identify 
joint and individual variation among the two datatypes, and we further 
investigated how this variation is related to the GBM subtypes. 

4.1. Quantifying joint and individual variation. Permutation testing (Sec- 
tion 2.4) was used to determine the ranks of estimated joint and individual 
structure. The test (using a = 0.01, and 1000 permutations) identified 

• rank 5 joint structure 

• rank 7 structure individual to miRNA 

• rank 27 structure individual to gene expression. 

The percentage of variation (sum of squares) explained in each dataset by 
joint structure, individual structure, and residual noise is shown in Figure 
11. This illustrates how the JIVE decomposition can be used to quantify and 
compare the amount of shared and individual variation between datatypes. 
As shown in Figure 11, joint structure is responsible for more variation in 
miRNA than in gene expression (31% and 16%, respectively), and the gene 
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expression data has a considerable amount of structured variation (53%) 
that is unrelated to miRNA. This is consistent with current biological un- 
derstanding, as the primary function of miRNA is thought to be modifica- 
tion of gene expression, whereas gene expression is involved in many more 
biological functions. 




■ Gene Expression 



Joint Individual Residual 



Fig 11. Percentage of variation (sum of squares) explained by estimated joint structure, 
individual structure and residual noise for miRNA and gene expression data. 



4.2. Sample distinctions on joint and individual structure. Sample scores 
for joint and individual structure, matrices S and Si in equation 2.2, reveal 
sample patterns that are present across datatypes, and patterns that are in- 
dividual to each datatype. Figure 12 shows separate scatterplots of the sam- 
ple scores for the first two principal components of estimated joint structure, 
the first two components individual to miRNA, and the first two components 
individual to gene expression. All four subtypes are clearly distinguished in 
the scatterplot of joint scores, but a subtype effect is not visually apparent 
in either of the individual scatterplots. 

Since the subtypes are defined by gene expression clustering, their ap- 
pearence in Figure 12 is not surprising. However, the clustering apparent in 
the joint plot shows involvement of miRNA in the differentiation of these 
subtypes. It is interesting that a subtype effect is not apparent in either 
scatterplot for individual structure, suggesting that this variation is driven 
by other biological components. This is remarkable, as the fraction of gene 
expression variation explained by joint structure (see Figure 11) is small. 

To numerically compare the extent to which subtype distinctions are 
present, we considered the variability within subtypes (across all rows) as 
a proportion of total variability. Table 1 gives SWISS scores (Standard- 
ized Within subtype Sum of Squares) for the gene expression and miRNA 
data, SWISS scores for the JIVE estimates of joint and individual structure, 
and SWISS scores for the JIVE estimates of joint and individual struc- 
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Fig 12. Scatterplots of sample scores for the first two joint components, first two individual 
miRNA components, and first two individual gene expression components. Samples are 
colored by subtype: Mesenchymal ( ), Proneural (blue). Neural (green) and Classical 
(red). 



ture with sparsity. Sparsity is enforced as described in Section 2.5, and the 
weight parameter A is determined via the Bayesian Information Criterion. 
A permutation test described in Cabanski et al. [2010] concludes that the 
four subtypes are significantly more distinguished on the estimated joint 
structure, both with and without sparsity, than on the gene expression and 
miRNA data {p < 0.001). SWISS scores for individual structure in gene 
expression and miRNA are close to one, as differences between subtypes are 
almost entirely represented in the joint structure between the two datatypes. 
This suggests that miRNA may play a greater role in GBM biology than 
previously thought. 

In general, these analyses illustrate how an unsupervised, integrated anal- 
ysis across multiple datatypes can result in a better distinction between 
subtypes or other biological classes. Note that one could carry out a similar 
analysis to investigate how the JIVE components relate to survival or other 
clinical factors, rather than subtype. Furthermore, a direct cluster analysis 
on the JIVE components could be used to identify sample groups that are 
distinguished across multiple datatypes. 

4.3. Exploring gene-miRNA associations through sparsity. A natural way 
to explore associations between individual genes and miRNAs is to compute 
the matrix of all gene-miRNA correlations, and then examine the set of 
significant correlations. A heat map showing significant gene-miRNA corre- 
lations is shown in Panel A of Figure 13. 

A sparse implementation of JIVE provides an alternative approach to 
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Table 1 

SWISS scores for TCGA subtypes. Lower scores indicate more subtype distinction. 



Data 


Gene expression 
miRNA 


0.8431 
0.8763 


JIVE 


Joint 

Gene expression individual 
miRNA individual 


0.7952 
0.9613 
0.9698 


JIVE with sparsity 


Joint 

Gene expression individual 
miRNA individual 


0.6797 
0.9800 
0.9865 



identifying gene-miRNA associations, and can reveal additional structure. 
Panel B shows the sample scores in the first joint component resulting from 
a sparse JIVE analysis of the data. Panel C shows all the gene-miRNA pairs 
with the property that both that gene and miRNA have non-zero loadings 
in the first joint component. Thus, the non-zero entries of the heat map 
have the form of a Cartesian product. We note that the non-zero entries in 
Panel C closely match those in the correlation map of Panel A, and that 
the signs of these entries also show good agreement. Scores for the first joint 
component (Panel B) distinguish the Mesenchymal and Proneural subtypes, 
suggesting that differences between these sample groups are driving the first 
joint component, and appear to influence the correlation structure of the 
data as well. 

Panels D and E display sample scores and non-zero loadings for the sec- 
ond joint component. Panel D shows that the second joint component distin- 
guishes the Neural and Classical subtypes. We note that Panel E is markedly 
different from Panel A, indicating that the second joint component is cap- 
turing associations between the expression of genes and miRNA that are not 
immediately apparent from the consideration of correlations alone. Indeed, 
these associations appear to be masked by variation captured in the first 
joint component. 

5. Summary and Discussion. TCGA and similar projects are provid- 
ing researchers with access to an increasing number of multi-block datasets. 

However, there are relatively few general statistical methods for the analysis 
of such integrated datasets. The unique features of JIVE provide a powerful 
new approach to analyzing multi-block data. JIVE finds both the coordi- 
nated activities of multiple datatypes, as well as those features unique to 
a particular datatype. We demonstrate how accounting for joint structure 
can lead to better estimation of individual structure, and vice-versa. Our 
application of JIVE to multi-block data on GBM tumor samples has pro- 
vided better characterization of tumor types, and better understanding of 
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Correlations Component 1 Component 2 



A 




Fig 13. Plot of gene-miRNA correlations (A), and scores and loadings for the first two 
sparse joint components (B-E). In (A), gene-miRNA pairs are colored red if they have 
a significant positive correlation and blue if they have a signficant negative correlation 
(P < 10~^/ Panels (B) and (D) show sample scores for the first two joint components, 
colored by subtype. Panels (C) and (E) display gene-miRNA pairs where each have non- 
zero loadings. Pairs are colored red if both gene and miRNA loadings have the same sign, 
blue otherwise. In panels (A), (C) and (E) genes and miRNAs are ordered separately by 
average linkage correlation clustering. 

the biological interactions between the given datatypes. 

While this paper focuses on vertically integrated biomedical data, the 
JIVE model and algorithm are very general and may be useful in other con- 
texts. A similar approach can be applied to horizontally integrated data, in 
which disparate sets of samples (e.g. sick and healthy patients) are available 
on the same datatype. In finance, JIVE has the potential to improve on 
current models that explain variation across and within disparate markets 
(see Bekaert et al. [2009]). These applications are currently under study. 

APPENDIX A 

A.l. Notes on Orthogonality. Enforcing orthogonality between joint 
and individual structure in the JIVE decomposition does not constrain the 
solution, and is a sufficient statement of condition for uniqueness of each 
component under mild assumptions. Here we formally present and prove 
this result. 
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Theorem A.l. Let 



T = 



Ti 




Jl 




Ai 








+ 




Tk 








Ak 



where rank{Ti) = rank{J) + rank{Ai) Vi. Then^ 
1. There exists J-^ , A-^ such that 













\Ai] 


Ji 




Ak 




+ 











T = 



rank{J^) = rank[J), rank[Aj^) = rank[Ai) Vi^ and J-^[A-^)' = Opxp- 
2. //row(Ai) n . . . n row(A/c) = 0^ then and A-^ in (1) are uniquely 
defined. 

Proof. 



1. Let Pj define the projection matrix onto the row space of J, row(J). 
For i = 1, . . . , A;, define 

jt = Ji + AiPj and Ai = Ai{I-Pj). 

Then, + Aj- = Ji + AiPj + Ai - AiPj = J^ + Ai Vi, and hence 





' Jf 




' Ai- 


T = 




+ 













Furthermore, 

J^A^' = J^{I- PM"-' = (J^ - J^)A^' = Opxp. 

Note row(J^) C row(J), so rank(J^) < rank(J). 

Also rank(A^) < min{rank(A^), rank(/ - Pj)} < rank(A^). 

Hence, as 

rank( J) + rank(A^) = rank(r^) = rank( J-^) + rank(A^), 
rank( J-^) = rank( J) and rank(A^) = rank(A^) Vi. 
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2. Assume 



r = 



J 


A 


' Jl ' 






J2 


+ 


A2 









where rank(J) = rank(J), rank(A^) = rank(A^) Vi, and J {Ay = O^xp- 
Then, 

rank(J^) = rank(T^) — rank(74^) = rank(J). 
So, as row(J^) C row(J) Vz, 

row(Ji) = . . . = row(Jfc) = row(J). 

Note that 

Ji = JiPj:^- + JiPA^ 

i i 

We will show JiPai. = 0. 

i ^ 

First, take d G row( J^P4_l). Then, for any j = 1, fc, 

i 

x(m{dA^) C row(J) C xcm^Tj) = row(J-^) U row(Aj-), 
hence there exists aj , bj such that 

a'jJ^ + b'jAf = c'Al 

^a'jJ^ = c'Ai-b'jAf 
-^a'jJ^ = 0, 
since row(J-'-) ± {row(^^) Urow(Aj-)}. So, 

b'^Af = c'Ai Vj. 

Note row(74^) C row(74^)Vz, so by assumption 

icow{A^) n . . . n row(A^) = and hence row(c'74^) = 0. 

So JiPa^ = 0, and hence Ji = JiPj±^ Vi. So we conclude 

i 

row (J) C row(J^), and by an analogous argument row(J^) C row (J). 
It follows that = Ji Vi since 

TiPj^ = {J,^ + Ai)Pj^ = Jl- 

and 

TiPj^ = {fi + Ai)Pj = Ji. 
Furthermore, Af- = Ti - = Ti - Ji = AiMi. □ 
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A. 2. Pseudocode: Estimation. Pseudocode for the iterative estima- 
tion procedure described in Section 2.2: 

• Initialize X^^^^^ = X = [Xi^.X^Y 

• Loop: 

- Give J = UAV^ by rank r SVD of X^^^^^ 

- For i = 1, k: 

* Set Xy^iq^^ = X, - U,S 

* Give Ai = WiAV^ by rank n SVD of x^^'^^^'^^il - VV^), 
set Si = AF-^ 

* Set x/^i^^ = X^ - H^i^i 

- Set xJoi^t = [xJoi^t...xJoint]/ 

Note that the orthogonahty constraint is imposed in the estimation of indi- 
vidual structure. 

A. 3. Pseudocode: Rank Estimation. Psuedocode for the two-stage 
permutation based approach to rank selection described in Section 2.4, for 
given number of permutations n_perm and significance threshold a. 

To estimate rank-eff(X^): 

1. Initialize rank-eff(X^) = 

2. Determine the first singular value of X^, di{Xi). 

3. Determine the first singular value, after permuting columns within 
each row of X^. Repeat n_perm times. 

4. Test if the proportion of singular values after permutation greater than 
di{Xi) is less than a. 

5. If significant, remove variation due to 6/i(X^), set rank-eff(X^) = rank-eff(X^) + 
1, and repeat from Step 2. 

To estimate r, ri, . . . , r^: 

1. Initialize r = 

2. Let J, Ai, be the JIVE estimates to X, where rank(J) = r and 
rank(A,) = r, (r + n = rank-efr(Xi)). Set X^ = X, - Ji Vi. 

3. Test for remaining joint structure between X|*, ...,X^. 

• Determine the total variation explained by fitting rank 1 joint 
and rank r — 1 individual structure to [Xf . . . X^]^ 

• Determine the total variation explained by fitting rank 1 joint 
and rank — 1 individual structure, after permuting columns 
within each Xf. Repeat n_perm times. 
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• Test if the proportion of models on the permuted data that ex- 
plain more variability than the original model is less than a. 

4. If significant joint structure remains, set r = r + 1 and repeat from 
Step 2. 
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